Combining WEO and Penn World Table Data in R

Introduction

In this tutorial, we will go through the steps to combine two significant economic datasets: the World Economic Outlook (WEO) database and the Penn World Table (PWT) dataset. Both datasets offer a wealth of information on various economic indicators but use different systems for country codes. We will harmonize these codes to merge the datasets for selected variables and countries.

Prerequisites

Before we begin, ensure that you have R installed and the following libraries ready:

  • haven: for reading and writing Stata files.
  • dplyr: for data manipulation.
  • countrycode: for converting country codes.

Load the libraries by running the following commands:

library(haven)
library(dplyr)
library(countrycode)
library(here)

Reading the Data

Let’s start by reading the WEO and PWT datasets. The data is assumed to be in Stata format (.dta files) and located within a databases folder in your working directory.

weo <- read_dta(here("databases/WEOApr2023Pub.dta"))
penn <- read_dta(here("databases/pwt100.dta"))

Selecting and Filtering Variables

Due to the large size of the databases, we will focus on a few key variables and countries. Specifically, we will look at economic data from the Netherlands, USA, and Japan.

From the WEO dataset, we select the country name, IMF country code, year, and real GDP (ngdp_r).

weo <- weo %>%
  select(country, ifscode, year, ngdp_r) %>%
  filter(ifscode %in% c(111, 138, 158))

From the PWT dataset, we select the country code, year, and employment (emp).

penn <- penn %>%
  select(countrycode, year, emp) %>%
  filter(countrycode %in% c("USA", "NLD", "JPN"))

Standardizing Country Codes

To merge the datasets, we need to standardize the country codes. The WEO uses numeric ifscode, while PWT uses three-letter countrycode. We will convert both to the ISO three-letter codes.

# For Penn, directly assign the country codes to a new column
penn$iso3 <- penn$countrycode

# For WEO, convert IMF numeric codes to ISO three-letter codes
weo$iso3 <- countrycode(sourcevar = weo$ifscode, origin = "imf", destination = "iso3c")

In the coming future, the WEO will change to use ISO3 letter codes and you will not need to perform this conversion any more. This would, however, remain useful when merging to databases like Haver, that uses IFS codes.

Merging the Data

Now that we have a common country code system, we can merge the datasets based on these codes and the year.

combined_data <- merge(weo, penn, by = c("iso3", "year"))

Use tail() to take a peek at the last few rows of your merged dataset:

tail(combined_data)[1:7]
    iso3 year       country ifscode   ngdp_r countrycode      emp
205  USA 2014 United States     111 16932.03         USA 148.2206
206  USA 2015 United States     111 17390.30         USA 150.2485
207  USA 2016 United States     111 17680.30         USA 152.3970
208  USA 2017 United States     111 18076.65         USA 154.6723
209  USA 2018 United States     111 18609.05         USA 156.6759
210  USA 2019 United States     111 19036.05         USA 158.2996

Conclusion

You now have a single dataset that combines key economic indicators from the WEO and PWT databases, standardized to use consistent country codes. This data is ready for analysis, allowing you to investigate trends and perform comparisons across the selected countries.